Efficient Algorithms for Discovering Frequent and Maximal Substructures from Large Semistructured Data

Author

Hiroki Arimura

Abstract

In this paper, we review recent advances in efficient algorithms for semi-structured data mining, that is, the discovery of rules and patterns from structured data such as sets, sequences, trees, and graphs. After introducing basic definitions and problems, we present efficient algorithms for frequent and maximal pattern mining for classes of sets, sequences, and trees. In particular, we explain general techniques, called the rightmost expansion and PPC-extension, which are powerful tools for designing efficient algorithms. We also give examples of applications of semi-structured data mining to real-world data.
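The idea behind rightmost expansion can be illustrated with a minimal Python sketch for frequent labeled ordered trees. This is not the paper's implementation: the preorder `(depth, label)` encoding, the helper names `children_lists`, `rightmost_path`, and `mine_frequent_trees`, and the use of document frequency (number of data trees containing the pattern) as support are all assumptions made for illustration. A pattern grows only by attaching a new last child to a node on its rightmost path, so every ordered tree is generated exactly once from its unique parent (the pattern with its rightmost leaf removed), and occurrences are extended incrementally instead of being re-matched from scratch.

```python
from collections import defaultdict


def children_lists(tree):
    """Ordered child-index lists for a tree given as preorder (depth, label) pairs."""
    kids = [[] for _ in tree]
    stack = []  # indices of the nodes on the path from the root to the current node
    for i, (depth, _label) in enumerate(tree):
        del stack[depth:]  # pop back to this node's parent
        if stack:
            kids[stack[-1]].append(i)
        stack.append(i)
    return kids


def rightmost_path(pattern):
    """Indices of the pattern's rightmost-path nodes; entry d is the node at depth d."""
    path = []
    for i, (depth, _label) in enumerate(pattern):
        del path[depth:]
        path.append(i)
    return path


def mine_frequent_trees(db, minsup):
    """Enumerate labeled ordered subtree patterns contained in >= minsup data trees."""
    kids = [children_lists(t) for t in db]
    results = []

    def support(occs):
        # Document frequency: number of distinct data trees with an occurrence.
        return len({tid for tid, _img in occs})

    def expand(pattern, occs):
        results.append((pattern, support(occs)))
        path = rightmost_path(pattern)
        ext = defaultdict(list)  # (depth, label) of new node -> extended occurrences
        for tid, img in occs:
            # The new node becomes the last child of the rightmost-path node at depth d - 1.
            for d in range(1, len(path) + 1):
                parent = img[path[d - 1]]
                # It must lie to the right of the image of that node's current last child.
                left = img[path[d]] if d < len(path) else -1
                for c in kids[tid][parent]:
                    if c > left:
                        ext[(d, db[tid][c][1])].append((tid, img + (c,)))
        for (d, label), new_occs in sorted(ext.items()):
            if support(new_occs) >= minsup:  # support is anti-monotone, so prune here
                expand(pattern + [(d, label)], new_occs)

    seeds = defaultdict(list)  # single-node patterns, keyed by label
    for tid, tree in enumerate(db):
        for i, (_depth, label) in enumerate(tree):
            seeds[label].append((tid, (i,)))
    for label, occs in sorted(seeds.items()):
        if support(occs) >= minsup:
            expand([(0, label)], occs)
    return results


if __name__ == "__main__":
    db = [
        [(0, "A"), (1, "B"), (1, "C")],  # A(B, C)
        [(0, "A"), (1, "B"), (2, "C")],  # A(B(C))
        [(0, "A"), (1, "C"), (1, "B")],  # A(C, B)
    ]
    for pattern, sup in mine_frequent_trees(db, minsup=2):
        print(sup, pattern)
```

On the three-tree database in the `__main__` block with `minsup=2`, the sketch reports the single nodes A, B, and C and the two-node patterns A(B) and A(C). Patterns such as A(B, C) and A(B(C)) occur in only one tree each and are pruned. Storing full embeddings keeps the code short, but occurrence lists can grow quickly on large data. Practical miners typically keep more compact occurrence representations, such as only the image of the rightmost leaf.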

Similar resources

Discovering Frequent Substructures in Large Unordered Trees

In this paper, we study a frequent substructure discovery problem in semi-structured data. We present an efficient algorithm Unot that computes all frequent labeled unordered trees appearing in a large collection of data trees with frequency above a user-specified threshold. The keys of the algorithm are efficient enumeration of all unordered trees in canonical form and incremental computation ...

FRACTURE mining: Mining frequently and concurrently mutating structures from historical XML documents

In the past few years, the fast proliferation of available XML documents has stimulated a great deal of interest in discovering hidden and nontrivial knowledge from XML repositories. However, to the best of our knowledge, none of the existing work on XML mining has taken into account the dynamic nature of XML documents as online information. The present article proposes a novel type of frequent pat...

A comprehensive method for discovering the maximal frequent set

Association rule mining can be divided into two steps. The first step is to find all frequent itemsets, whose occurrences are greater than or equal to the user-specified threshold. The second step is to generate reliable association rules based on all frequent itemsets found in the first step. Identifying all frequent itemsets in a large database dominates the overall performance in the a...

YAFIMA: Yet Another Frequent Itemset Mining Algorithm

Efficient discovery of frequent patterns from large databases is an active research area in data mining with broad applications in industry and deep implications in many areas of data mining. Although many efficient frequent-pattern mining techniques have been developed in the last decade, most of them assume relatively small databases, leaving extremely large but realistic datasets out of reac...

Knowledge Discovery for Sensor Network Comprehension

During the past decade, we have witnessed an explosive growth in our capabilities to both generate and collect data. Various data mining techniques have been proposed and widely employed to discover valid, novel and potentially useful patterns in these data. Data mining involves the discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in h...

Publication date: 2010
